Background: Data collection by Electronic Medical Record (EMR) systems has been proven to be helpful for 
scientific research and in improving healthcare. For a multi-centre trial in Indonesia and the 
Netherlands a web based system was selected to enable all participating centres to easily access data. This study 
assesses whether the introduction of a Clinical Trial Data Management service (CTDMS) composed of electronic 
Case Report Forms (eCRF) can result in effective data collection and treatment monitoring. 

Methods: Data items entered were checked for inconsistencies automatically when submitted online. The data 
were divided into primary and secondary data items. We analysed both the total number of errors and the change 
in error rate, for both Primary and Secondary items, over the first five months of the trial. 

Results: In the first five months 51 patients were entered. The Primary data error rate was 1.6%, whilst that for 
Secondary data was 2.7% against acceptable error rates for analysis of 1% and 2.5% respectively. 

Conclusion: The presented analysis shows that five months after the introduction of the CTDMS the Primary 
and Secondary data error rates reflect acceptable levels of data quality. Furthermore, these error rates were 
decreasing over time. The digital nature of the CTDMS, as well as the online availability of that data, gives fast and 
easy insight into adherence to treatment protocols. As such, the CTDMS can serve as a tool to train and educate 
medical doctors and can improve treatment protocols. 



Background 

Data collection concerning medical needs is required to 
assess the effectiveness of interventions and current 
health care practices [1]. Furthermore, data collection by 
Electronic Medical Record (EMR) systems has been proven 
to be helpful for scientific research and can be helpful 
in improving healthcare. 
These EMR systems allow for the early identification of 
missing data and of patients possibly lost to follow-up, 
which is essential for the conduct of proper scientific 
research [2-6]. 

A Clinical Trial Data Management service (CTDMS) 
has been introduced for running a multicenter clinical 
trial in Indonesia and in the Netherlands. The same sys- 
tem has also been introduced for monitoring treatment 
results of Nasopharyngeal Carcinoma (NPC) in 
Indonesia. 

In most countries NPC is an orphan disease, but overall 
it has a worldwide incidence of 80,000 new cases per 
year, being endemic in Northern Africa, Southern China 
and Hong Kong, and the South-East Asian peninsula, 
including Malaysia, Vietnam, Thailand, Singapore and 
Indonesia. 

In Indonesia NPC is the most frequent cancer in the 
head and neck area and ranks as the 4th most common 
tumour found in males. The incidence is estimated at 6 
per 100,000, leading to 12,000 new cases per year [7,8]. 
Little is known about treatment results of NPC in 
Indonesia. 

The CTDMS system was selected because of its web-based 
nature, which makes the data accessible to all 
participating parties. This online accessible data system 
has made it easier for the principal investigator to check 
the data for inconsistencies. The senior physician can 
easily see if treatment is according to protocol. 

This study assesses whether the introduction of 
a CTDMS composed of online Case Report Forms (eCRF) 
can result in improved patient outcomes. The assessment 
focuses on data quality and the identification of 
possible bottlenecks within the patient care process. 

This study investigates if a web-based CTDMS can be 
helpful in proper data collection by analysing errors in 
data items. Bottlenecks in patient care are analysed by 
comparison of treatment plan and actual treatment. 

Methods 

The CTDMS is constructed for the NPC Clinical Trial: 
Early detection of primary and recurrent NPC using 
(anti-)EBV based tumour markers and evaluation of pri- 
mary treatment for NPC (funding KWF NKI-2008- 
4233). A technical description of the CTDMS is provided 
in Appendix 1. The database comprises 10 
online eCRFs. In order to prevent errors from being 
entered, data validation rules were implemented into the 
eCRFs prior to commencement of the NPC Clinical 
trial. These data validation rules assess whether certain 
pre-specified conditions are valid and can therefore pin- 
point omissions or erroneous data. Online warning mes- 
sages notify the data-manager (entering data) when 
errors are detected. Commonly used checks are, for 
instance, range checks that verify whether values are 
within the boundaries dictated by the study protocol, 
and mandatory field checks (i.e. 'This field cannot be 
blank'). 
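
The two kinds of checks named above can be illustrated with a minimal Python sketch. This is not the CTDMS implementation (the trial's rules were built with InfoPath); the `FieldRule` class, the `validate` function and the field names are hypothetical, and only the heart rate and temperature bounds are taken from Table 4.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class FieldRule:
    """Validation rule for one numeric data item (illustrative only)."""
    name: str
    mandatory: bool = False
    low: Optional[float] = None   # lower bound allowed by the protocol
    high: Optional[float] = None  # upper bound allowed by the protocol


def validate(record: Dict[str, Any], rules: List[FieldRule]) -> List[str]:
    """Return one warning per violated rule; an empty list means no warnings."""
    warnings = []
    for rule in rules:
        value = record.get(rule.name)
        if value is None or value == "":
            # Mandatory field check ('This field cannot be blank').
            if rule.mandatory:
                warnings.append(f"The field {rule.name} cannot be blank.")
            continue  # nothing to range-check for an empty field
        # Range check against the protocol boundaries.
        if rule.low is not None and value < rule.low:
            warnings.append(f"The field {rule.name} cannot be less than {rule.low}.")
        if rule.high is not None and value > rule.high:
            warnings.append(f"The field {rule.name} cannot be more than {rule.high}.")
    return warnings


# Bounds from Table 4; the field names are assumptions made for this sketch.
RULES = [
    FieldRule("heart_rate", mandatory=True, low=40, high=220),
    FieldRule("temperature", low=35, high=41.5),
]

if __name__ == "__main__":
    print(validate({"heart_rate": 230, "temperature": 36.8}, RULES))
```

Returning a list of messages rather than raising on the first failure mirrors the behaviour described in the text, where the data-manager sees every warning for a form at once.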

Of the 10 eCRFs, 9 were required to be completed 
multiple times per patient during the study and only 1 
was to be completed and submitted once per patient. 
Each of these submissions is a unique realization of the 
form. For example, for one patient a laboratory form is 
completed during baseline measurements, just before 
the start of the treatment. Once this form is submitted 
through the CTDMS, there is one realization of the 
laboratory form stored in the database for this patient. 
After the patient received treatment, a laboratory form 
is completed and submitted again. The database then 
contains two realizations of the laboratory form, for that 
patient. Each realization may be submitted multiple 
times if it contained errors. We note that it is impossi- 
ble to claim that an entered form for which no warning 
messages were displayed is clean, as new errors may be 
found later. 

The data-manager completing an eCRF has the option 
to ignore (override) a warning message; however, in such 
cases, he/she is required to provide an explanation 
which is recorded in an Audit Trail entry field (error 
log). Warning messages and error logs are also created 
when an incorrect value or data-type is entered, an 
omission is detected, or when a previously entered value 
is changed. Changed values are considered to be (pre- 
viously undetected) errors that have now been rectified 
(except when the changed value also triggers a check to 
fire, in which case the data is considered unclean). 
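
The override mechanism can be sketched as follows. This is an illustration under stated assumptions, not the ALEA audit trail: the `AuditEntry` structure, the `override_warning` function and the example field name are hypothetical. The only behaviour taken from the text is that an override is refused unless an explanation is supplied, and that the explanation is logged.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass
class AuditEntry:
    """One error-log record (hypothetical structure)."""
    data_item: str
    warning: str
    explanation: str
    recorded_at: datetime


def override_warning(audit_trail: List[AuditEntry], data_item: str,
                     warning: str, explanation: str) -> AuditEntry:
    """Accept a flagged value only if the data-manager explains why."""
    if not explanation.strip():
        raise ValueError("An explanation is required to override a warning.")
    entry = AuditEntry(data_item, warning, explanation.strip(),
                       datetime.now(timezone.utc))
    audit_trail.append(entry)
    return entry


if __name__ == "__main__":
    trail: List[AuditEntry] = []
    override_warning(trail, "ct_mri_site",
                     "Site differs from the Endoscopy form",
                     "Confirmed with the radiologist")
    print(len(trail), trail[0].explanation)
```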

The eCRFs contain differing quantities of data. Each 
field to be entered is considered a data item, and each data item was 
designated either as primary or secondary. Primary data 
items are data that were considered essential for the 
assessment of the NPC Clinical trial primary endpoint, 
and so for assessment of treatment protocol. Secondary 
data items are data required to assess the clinical trial's 
secondary endpoints. 

As acceptable levels of data quality, a 1% error rate 
for primary and a 2.5% error rate for secondary data 
points were adopted [9]. We present the change in error 
rate over the course of the trial, the number of errors 
per submission, and the change in data quality per form 
per submission. 
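
As a minimal sketch of how an error rate can be compared with these thresholds, assuming the error rate is the number of unresolved erroneous data items divided by the number of data items entered (the function names and the item counts in the example are hypothetical; only the 1% and 2.5% limits come from the text):

```python
# Acceptable error rates adopted in the text [9].
ACCEPTABLE_ERROR_RATE = {"primary": 0.01, "secondary": 0.025}


def error_rate(n_errors: int, n_items: int) -> float:
    """Fraction of entered data items that are erroneous."""
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    return n_errors / n_items


def is_acceptable(kind: str, n_errors: int, n_items: int) -> bool:
    """True if the error rate does not exceed the threshold for this kind of data."""
    return error_rate(n_errors, n_items) <= ACCEPTABLE_ERROR_RATE[kind]


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the comparison.
    print(is_acceptable("primary", 16, 1000))
    print(is_acceptable("secondary", 20, 1000))
```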

Results 

Between November 2009 and March 2010 a total of 
4860 data items pertaining to 51 patients were entered. 
This is the first five months of an estimated 3-year-long 
accrual period. In total 433 eCRFs were submitted, of 
which 329 were unique realizations. Each CRF has been 
submitted between 1 and 4 times. Table 1 presents an 
overview of the submitted eCRFs and data items. 

Of the 329 unique eCRF realizations, 287 were submitted 
for the first time without primary data errors 
(Table 2), while 253 forms (realizations) were submitted 
for the first time without secondary data errors (Table 
3). No form had more than two errors in the primary 
data. One form contained 10 secondary data errors 
when it was submitted for the first time. This was base- 
line patient registration data for which the wrong 
patient was entered. In general subsequent submissions 
contained fewer errors (Figure 1). 
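
The counts in Table 2 can be cross-checked against the totals reported above. The short sketch below does only that arithmetic; the fourth row is read as 3 forms with no errors and 1 form with one error, following the note under Table 2, and the variable names are chosen for illustration.

```python
# Table 2: submission number -> number of forms with 0, 1 and 2 primary data errors.
TABLE_2 = {
    1: (287, 31, 11),
    2: (78, 9, 2),
    3: (8, 3, 0),
    4: (3, 1, 0),
}

# Forms received at each submission round.
forms_per_submission = {n: sum(counts) for n, counts in TABLE_2.items()}

# First submissions correspond to unique realizations; all rounds together
# correspond to the total number of submitted eCRFs.
unique_realizations = forms_per_submission[1]
total_submissions = sum(forms_per_submission.values())

# Share of forms without primary data errors at each submission round.
error_free_share = {n: counts[0] / sum(counts) for n, counts in TABLE_2.items()}

if __name__ == "__main__":
    print(unique_realizations, total_submissions)
    for n, share in error_free_share.items():
        print(n, round(share, 3))
```

The row sums reproduce the 329 unique realizations and the 433 submissions reported in the text.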

For example, the "Pathology, Staging & Given treat- 
ment" eCRF contained a total of 89 unique data items, 
with the number of data items per eCRF ranging from 2 
to 18 (Table 1). Of these 89 items, 40 were classified as 
primary data, with the remaining 49 being classified as 
secondary. The error rate at first submission was 3.3% 
for primary data and was 8.4% for secondary data. 

To assess the change in data quality over time, the 
proportion of unresolved errors in primary and secondary 
data was plotted against time (in months). Figure 2(A) 
presents the cumulative number of unique data items 
submitted and the number of unresolved errors over the 
first five months of the study. Figure 2(B) presents the 
change in the percentage of unresolved errors of primary 
and secondary data items. Although the absolute 
number of unresolved errors is increasing with time (due 
to the accrual of patients), the fraction of erroneous 
data is declining. Five months after the start of the study 
the error rate for the primary data items was 1.6% and 
for the secondary data items the error rate was 2.7%. 
Although not quite at the levels appropriate for final 
analysis, the standard of data quality is high, very early 
into the study. 

Table 1 Number of times each form has been entered and the number of data items per form 
*Number of forms submitted, some unique realizations have been submitted multiple times. 

Discussion 

For this study we found an error rate of 1.6% for pri- 
mary data items, while in earlier studies in the same set- 
ting data could not be analysed because of the massive 
data loss and poor data quality. With this real time data 
monitoring and inbuilt checks we have realized accepta- 
ble levels of data quality. The CTDMS prevents us from 
missing data or ending up with poor quality data at the 
end of the study, which often at that point cannot be 
resolved anymore. 

The presented analysis shows that five months 
after the introduction of the CTDMS the error rates for 
both Primary and Secondary data items reflect acceptable 
levels of data quality. Furthermore these error rates 
were decreasing over time. The drop in errors per form 
with each form submission indicates that, while being 
prompted by the CTDMS, the data manager and 
responsible doctors are actively solving the errors. 
Online warning messages notify the data manager 
(entering data) when errors are detected, allowing them 
to immediately correct the data, rather than after the 
delay usually associated with paper-based CRFs. 

Table 2 The number of forms submitted with erroneous 
primary data items at each submission 

                     Number of errors 
Submission number      0      1      2 
1                    287     31     11 
2                     78      9      2 
3                      8      3      0 
4                      3      1      0 

For example 287 forms were entered error free at the first submission, while 
four were entered for the fourth time, 3 error free and one with one error. 

Clearly, the CTDMS encourages local data managers 
to verify the entered data and, if necessary, ask the doctor 
whether the information is correct. It is also likely 
that the requirement to provide an explanation before 
submitting a form in which the CTDMS detects erroneous 
data motivates data managers to verify whether 
the available data is actually correct. This may explain 
why our results show a significant increase of clean data; 
a self-learning curve of the data manager is also to be 
expected. Moreover, the error logs provide valuable 
information about the bottlenecks in the treatment of 
the NPC patients. 

In the past authors have pointed out that existing data 
collections in developing countries are often deficient 
[10,11]. Eiseman and Fossum (2005) emphasize that 
existing data collections are insufficiently comprehensive, 
sometimes inaccurate, and often out of date by the 
time the data can be acted upon. All point out that 
without these data the required empirical knowledge to 
address the health problems in developing countries is 
insufficient, especially for strategic planning, priority setting, 
monitoring and evaluation, advocacy, and general 
policymaking [12-14]. 

Table 3 The number of forms submitted with erroneous 
secondary data items at each submission 

Figure 1 Errors per submission for the six of the nine forms for which there has been more than one submission (per 
realization). 

These observations supported our decision to introduce an 
online medical record system, which could play an 
important role in improving data collection and data 
quality. During analysis we have also seen 
that treatment procedures are often unsatisfactory. The 
first analysis regarding the treatment of NPC has been 
presented and discussed with all members of the disciplinary 
team. The main concern was the duration of the 
radiotherapy. According to the protocol, administering 
the 66 gray radiotherapy should take at most 
42 days, yet analysis showed that the treatment 
takes on average 66 days, which will lead to 
inadequate treatment [15,16]. Future analysis has to 
show if intervention by CTDMS system-based education 
of the doctors will eventually lead to better treatment 
outcome. The digital nature of the CTDMS, as well as 
the online availability of data, gives fast and easy insight 
into adherence to treatment protocols. As such, the 
CTDMS can serve as a tool to directly train and educate 
medical doctors. Therefore, a potentially even bigger 
advantage of an online medical record system is the 
ability to monitor the data from the teaching hospitals, 
especially in developing countries. In this way the teachers 
can communicate directly or visit the participating 
hospitals with a custom-fit teaching program, which will 
make such visits more effective. 

Figure 2 (A) The cumulative number of unique data items 
entered (black), the number of primary data items entered 
(blue solid) and errors unresolved (blue dashed), the number 
of secondary data items entered (red solid) and errors 
outstanding (dashed lines). (B) The percentage of Primary (blue) 
and Secondary (red) errors unresolved over time. The horizontal 
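
The comparison of planned and actual treatment described above can be sketched in a few lines of Python. The 42-day limit is the protocol value stated in the text; the function name, the choice to measure duration as the difference between end and start dates, and the example dates are assumptions made for illustration.

```python
from datetime import date

MAX_RADIOTHERAPY_DAYS = 42  # protocol limit stated in the text


def radiotherapy_overrun(start: date, end: date) -> int:
    """Days by which a radiotherapy course exceeds the protocol limit (0 if within it)."""
    duration = (end - start).days
    if duration < 0:
        raise ValueError("end date precedes start date")
    return max(0, duration - MAX_RADIOTHERAPY_DAYS)


if __name__ == "__main__":
    # Hypothetical course lasting 66 days, the average duration reported above.
    print(radiotherapy_overrun(date(2010, 1, 4), date(2010, 3, 11)))
```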

Conclusion 

We show that an online clinical data management ser- 
vice can improve data quality in a developing country 
setting. In the future we expect to see both less loss-to- 
follow-up and better treatment programmes with help 
of this CTDMS. For better and more efficient medical 
care programs and studies in developing countries we 
believe an online data system is essential. 



The digital nature of the CTDMS, as well as the 
online availability of that data, gives fast and easy insight 
into adherence to treatment protocols. As such, the 
CTDMS can serve as a tool to train and educate medical 
doctors and can improve treatment protocols. Since 
the introduction of this system training the doctors has 
become much more efficient. 

Appendix 1 

Technical description CTDMS 

The selected CTDMS is ALEA™ 3.0, which uses Microsoft 
InfoPath 2007 for eCRF template design and SharePoint 
Enterprise 2007 for rendering the eCRFs in 
browsers. SQL Server 2008 is used for data storage. The 
CTDMS provides a comprehensive eCRF. It uses standard 
browsers running on any computer connected to 
the internet. The system has been validated, and has 
been certified by registered auditors, as being in compliance 
with relevant regulations, such as the FDA's 21 CFR 
Part 11. 

Table 4 List of filters/rules which trigger error flags 

eCRF: Patient information/Physical examination form 

Data item: Date of First Visit 
1. The field Date of First Visit is mandatory and cannot be blank. 
2. The Date of First Visit cannot be after the Date of today 

Data item: Date of Birth 
1. The patient Date of Birth cannot be blank 
2. The patient Date of Birth cannot be after the current date. 
3. The Date of birth you entered is before 1-1-1900, the patient is too old to participate in this study 

Data item: Date of Signed informed consent 
1. The field "Date of Signed informed consent" is mandatory and cannot be left blank 
2. The Date of First Visit cannot be after the Date of today 

Data item: Heart Rate 
1. The field Heart Rate cannot be blank. 
2. The field "heart rate" cannot be more than 220 p/m 
3. The field "heart rate" cannot be less than 40 p/m 

Data item: Temperature 
1. Patient Temperature should be between 35 and 41.5 degrees 

Data item: WHO performance rate 
1. The field "performance rate" is mandatory and cannot be left blank 
2. During First Visit, the "WHO performance rate" cannot be either 3 or 4. 

eCRF: Radiology Diagnostics form 

Data item: CT/MRI-scan Date Sample Taken 
1. The field CT/MRI-scan Date Sample Taken is mandatory and cannot be blank. 
2. The Date of sample taken, CT/MRI-scan, cannot be before the Date of Visit, as specified on the 
General Patient Information form (belonging to this Event/Visit). 
3. The Date Sample Taken CT/MRI-scan cannot be before the Date of First Visit. 
4. The Date Sample Taken CT/MRI-scan during PDT Assessment should be about 12 weeks after the 
Date of Foscan Administration and no less than 10 weeks after that date. 
5. The Date of CT/MRI-scan cannot be later than today. 

Data item: T-stage 
1. The T-stage (0,1, 2A and 2B,3,4) in the field "CT-MRI T-stage" has to be the same as the T (0,1, 2A and 
2B,3,4) in the field "T-stage" on the Pathology form. 
2. The T-stage cannot be "0" if patient has been included in this study because of a recurrent or 
persistent disease 

Data item: CT/MRI Lesion Size 
1. The field CT/MRI Lesion Size (length) cannot be blank when the field T-stage has been entered. 
2. The field CT/MRI Lesion Size (length) cannot be more than 200 mm 

Data item: CT/MRI site 
1. The field "CT/MRI site" is mandatory and cannot be left blank 
2. The tumour site CT/MRI-scan selected here differs from the Suspicious site as specified on the 
Endoscopy form during this Visit. Are you sure this is correct? If yes, please provide explanation in 
Audit Trail entry after Validation 
3. The tumour site CT/MRI-scan selected here differs from the Tumour site as specified on the 
Endoscopy form during this Visit. Are you sure this is correct? If yes, please provide explanation in 
Audit Trail entry after Validation 

Data item: Endoscopy 
1. If Nose endoscopy has been performed, the field Date of Endoscopy cannot be blank 
2. During (visit) Positive Test, the Date of Endoscopy should be after the Date of Endoscopy of the 
PDT Assessment. 
3. The Date of Endoscopy should be after the Date of Endoscopy of the last PDT Follow Up 
4. The Date of Endoscopy should be after the Date of Endoscopy of the PDT Visit(s) 
5. The Date of Endoscopy should be after the Date of Endoscopy of the previous Follow Up visit. 
6. The Date of Endoscopy should be after the Date of Endoscopy of the First Visit. 
7. The Date of Endoscopy should be 3 months after the Date of Endoscopy of the Therapy 
Assessment 

Examples of data validation rules triggering error flags for a subset of the data items on two of the eCRFs. 

The CTDMS eCRF design module is based on an 
industry grade enterprise electronic forms system: 
Microsoft Infopath 2007 for form design and Microsoft 
Forms Server 2007 for data entry. The components 
make use of a common standard representation of data 
and metadata: the Operational Data Model of CDISC. 
Within the CTDMS, the components share a database 
for storing and retrieving information about the trial, 
and a separate database for storing and retrieving 
patient data. 

The online Data Management Module of the CTDMS 
is a web browser application that supports online com- 
pletion of eCRF for healthcare studies. It requires initial 
login with a username and password, and provides a 
navigation menu for all trials to which the account has 
been granted access, and the selected investigators for 
which the account has been granted permissions to 
access. Transmission of data is SSL encrypted using 
RSA 1024 bit Public Key encryption. 

Data validation rules were implemented into the 
eCRFs using the tools Microsoft Office InfoPath provides, 
as well as some XPath expressions. With data validation 
rules implemented, the eCRF automatically 
checks the data as soon as it is entered. If a value does 
not match the specified condition, an error alert pro- 
vides the user with immediate feedback. Moreover, after 
completion of an eCRF, the user is prompted to provide 
an explanation of all data items which raised validation 
errors. This enables users to submit data with validation 
errors, while providing a comprehensive audit trail in 
compliance with requirements from regulatory authorities. 
Examples of data validation rules that trigger 
error flags are provided in Table 4. 
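
To make the shape of such a rule concrete, the three Date of Birth rules listed in Table 4 can be restated in Python. This is only a sketch: in the CTDMS the rules are implemented with InfoPath tooling and XPath expressions, not Python, and the function name and the explicit `today` parameter are assumptions made so that the example is self-contained.

```python
from datetime import date
from typing import List, Optional

EARLIEST_BIRTH_DATE = date(1900, 1, 1)  # cut-off named in Table 4


def check_date_of_birth(dob: Optional[date], today: date) -> List[str]:
    """Return the Table 4 error messages triggered by a Date of Birth value."""
    if dob is None:
        return ["The patient Date of Birth cannot be blank"]
    errors = []
    if dob > today:
        errors.append("The patient Date of Birth cannot be after the current date.")
    if dob < EARLIEST_BIRTH_DATE:
        errors.append("The Date of birth you entered is before 1-1-1900, "
                      "the patient is too old to participate in this study")
    return errors


if __name__ == "__main__":
    print(check_date_of_birth(date(2011, 5, 1), today=date(2010, 3, 1)))
```

Passing `today` explicitly, rather than reading the system clock inside the function, keeps the check reproducible when forms are re-validated later.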

Abbreviations 

EMR: Electronic Medical Record; CTDMS: Clinical Trial Data Management 
service; CRF: Case Report Forms; eCRF: electronic Case Report Forms; NPC: 
Nasopharyngeal Carcinoma; EBV: Epstein Barr Virus. 